Discarding impossible events from statistical language models
Authors
Abstract
This paper describes a method for detecting impossible bigrams within the space of V² bigrams, where V is the size of the vocabulary. The idea is to discard all ungrammatical events, which cannot occur in well-written text, and consequently to expect an improvement of the language model. In speech recognition, we also expect to reduce the complexity of the search algorithm by making fewer comparisons. To achieve this, we extract the impossible bigrams using automatic rules based on grammatical classes. Ungrammatical biclass associations are detected, and all the corresponding bigrams are analyzed and marked as possible or impossible events. Because grammatical rules in natural language can have exceptions, we decided to maintain an exception list for each retrieved rule.
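The abstract describes the detection step only in outline. The Python sketch below shows one way to apply biclass rules with per-rule exception lists. The tagset, the toy lexicon, the rules, and the names `LEXICON`, `RULES`, `is_impossible` and `impossible_bigrams` are hypothetical illustrations, not the authors' rules or code. It assumes a bigram is discarded only when every class pair its two words can take is a forbidden biclass and the bigram is not listed as an exception to that rule, so that grammatically ambiguous words are never discarded by mistake.

```python
from __future__ import annotations

# Hypothetical toy lexicon: each word maps to the set of grammatical
# classes it can take (ambiguous words have several classes).
LEXICON: dict[str, set[str]] = {
    "the": {"DET"},
    "a": {"DET"},
    "all": {"DET"},
    "cat": {"NOUN"},
    "runs": {"VERB", "NOUN"},
    "sleeps": {"VERB"},
}

# Hypothetical forbidden biclass rules, each with its own exception list.
RULES: dict[tuple[str, str], set[tuple[str, str]]] = {
    ("DET", "DET"): {("all", "the")},  # exception: "all the cats"
    ("DET", "VERB"): set(),            # no exceptions in this toy set
}


def is_impossible(w1: str, w2: str) -> bool:
    """Return True if the bigram (w1, w2) is ruled out by the biclass rules."""
    classes1 = LEXICON.get(w1)
    classes2 = LEXICON.get(w2)
    # Assumption: words missing from the lexicon are conservatively kept.
    if not classes1 or not classes2:
        return False
    for c1 in classes1:
        for c2 in classes2:
            exceptions = RULES.get((c1, c2))
            # One allowed class pair, or a listed exception, keeps the bigram.
            if exceptions is None or (w1, w2) in exceptions:
                return False
    return True


def impossible_bigrams(vocab: list[str]) -> list[tuple[str, str]]:
    """Scan all V^2 bigrams of the vocabulary and collect the impossible ones."""
    return [(w1, w2) for w1 in vocab for w2 in vocab if is_impossible(w1, w2)]


VOCAB = list(LEXICON)

if __name__ == "__main__":
    discarded = impossible_bigrams(VOCAB)
    print(f"{len(discarded)} of {len(VOCAB) ** 2} bigrams discarded")
    # prints "11 of 36 bigrams discarded"
```

With V = 6 the search space holds V² = 36 bigrams, of which these toy rules discard 11: eight determiner-determiner pairs and three determiner-"sleeps" pairs. The exception list keeps "all the" even though DET followed by DET is otherwise ruled out, and "the runs" survives because "runs" can also be a noun.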
Similar articles
Language Model Adaptation for Automatic Speech Recognition and Statistical Machine Translation
Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore statistical techniques have been dominant for language modeling over the last few decades. All statisti...
Statistical Model Checking for Cyber-Physical Systems
Statistical Model Checking is useful in situations where it is either inconvenient or impossible to build a concise representation of the global transition relation. This happens frequently with cyberphysical systems: Two examples are verifying Stateflow-Simulink models and in reasoning about biochemical reactions in Systems Biology. The main problem with Statistical Model Checking is caused by...
Analyzing the function of Quranic language from the viewpoint of Alame Tabatabie
realm of Quranic language, which from among Alame Tabatabie's is the most comprehensive. He believes that the Quranic language is a mixture of various languages. The language of some of the Quran's propositions is declarative and describes objective events – both tangible and intangible; five groups of Quranic verses are as stated below: Naturalistic verses: describe natural e...
Using Sentence-Level LSTM Language Models for Script Inference
There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline. We compare these systems with recent Recurrent Neural Net models that directly operate on raw tokens to predict sentences, finding the latter to ...
State Space Realization Theorems For Data Mining
In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for these formal series using the language and tools of Hopf algebras.